Peer-to-peer automated anonymous asynchronous file sharing

ABSTRACT

A service on a computer network that performs centralized searches based on index information transmitted by peer systems to a central site using an agent program running on each peer. Peer systems are directed to each other for the purpose of retrieving files. If none of the peer systems known to contain the files is online (and the file is therefore not available), the request is placed in a queue of file requests maintained by the central site. When a system containing the requested file connects to the service, the requested file is retrieved from that system and then distributed to the other systems which had requested the file.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 09/910,460 filed on Jul. 20, 2001, now U.S. Pat. No. 6,675,205, which is a continuation-in-part of U.S. application Ser. No. 60/219,983 filed on Jul. 21, 2000, U.S. application Ser. No. 09/419,405 filed on Oct. 14, 1999, now U.S. Pat. No. 6,516,337, and U.S. application Ser. No. 09/575,971 filed on May 23, 2000.

BACKGROUND OF THE INVENTION

A number of file discovery and sharing programs have become very popular for use across networks, especially those programs which permit the sharing of multimedia content. Users connect to a central directory service and upload a list of files that they currently have on their local system which may be requested by other participants in the directory service. To retrieve files, users send a request for a file to the central directory service which then connects the requesting user to another user's computer containing that file which computer is also currently online. The most popular program of this type is Napster, a utility for sharing audio files by manually registering them with a central directory service. Another popular program is Gnutella which shares more general-purpose files. The general term for both programs is a “peer-to-peer file sharing service”.

An additional application which has been developed based on this model is a distributed search engine. Operators of host computer sites wishing to permit searches register with the central directory service and then answer queries directed to them by that service. When a user performs a search, the central service receives the request, compares the request to information about the content of each host, and then transmits a copy of that request to all hosts which are able to satisfy the type of the request. The search results subsequently received from these hosts are then processed and sent to the requesting user. This is very similar to the functioning of existing search engines except that the searches are distributed to and performed by the individual hosts registered to a directory service rather than by the central site. This approach is commonly called a meta search engine.

SUMMARY OF THE INVENTION

Expanding on the above concepts, the invented system is a service which performs centralized searches based on index information transmitted by peer systems to the central site using an agent program running on each peer, and then directs the peer systems to each other for the purpose of retrieving files.

If none of the peer systems known to contain the file is online (and the file is therefore not available), the request is placed in a queue of file requests maintained by the central site. When a system containing the requested file connects to the service, the requested file is retrieved from that system and then distributed to the other systems which had requested the file. Files retrieved for systems not currently online are held in a queue until the user connects or are emailed to the user, usually as an email attachment. Or, when a computer system containing the file connects to the central site, the file is sent by the system containing the file either to the central site or directly to the user who requested the file via email attachment.
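The queuing behavior described above can be illustrated with a minimal sketch. The class name `CentralSite`, its methods, and the in-memory data structures are hypothetical and chosen only for illustration; the specification does not prescribe them, and actual file transfer is reduced here to recording who would receive what.

```python
from collections import defaultdict


class CentralSite:
    """Toy model of the central site's queue of pending file requests."""

    def __init__(self):
        self.index = defaultdict(set)     # file name -> peers known to hold it
        self.online = set()               # peers currently connected
        self.pending = defaultdict(list)  # file name -> requesters waiting
        self.delivered = []               # (file name, source peer, requester)

    def report_files(self, peer, files):
        """Record index information reported by a peer's agent."""
        for name in files:
            self.index[name].add(peer)

    def request(self, requester, name):
        """Serve the request from an online holder, or queue it."""
        sources = self.index[name] & self.online
        if sources:
            self.delivered.append((name, sorted(sources)[0], requester))
        else:
            self.pending[name].append(requester)

    def connect(self, peer):
        """When a holder comes online, satisfy every queued request it can."""
        self.online.add(peer)
        for name, holders in self.index.items():
            if peer in holders and name in self.pending:
                for requester in self.pending.pop(name):
                    self.delivered.append((name, peer, requester))
```

In this sketch a request for a file whose only holder is offline stays in `pending` until that holder calls `connect`, at which point every waiting requester is served from it.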

The indexing and content reporting functions necessary for the service are performed by an individual copy of an agent program downloaded and installed by each peer system user. This agent program is described in detail in pending U.S. patent application Ser. Nos. 09/419,405 and 09/575,971 by the same inventors which are hereby incorporated by reference. The indexing process on each system may be initiated manually or on a scheduled basis, with updates transmitted whenever the user connects to the central service.

The agent is also responsible for transmitting copies of the requested file to the systems whose requests are waiting in the queue and picking up copies of files from the queue it had previously requested.

Unlike competing prior art systems, this agent-enabled system is able to maintain a central searchable index of the contents of the files, which is always available to users whether or not the site reporting the information found in the index is on-line.

This invention has great application not only in the general Internet market, but also in intranet markets where many users maintain local copies of files. It is also extremely useful for communities of users who wish to exchange similar information, or for mobile users who are not always able to be online at opportune times. This invention allows users to share files without having a web page.

This invention also allows the identity of each contributor of a copy of a file to remain anonymous. Only the central server knows the internet address and other identifying information about each contributor, and this information is stripped from each file before the file is forwarded.

This system also allows the sharing of files by systems which are protected by a secure firewall. The firewall prevents computers on the inside from serving files in response to conventional requests from the outside, but it allows the sending of an email with an attachment. To allow operation of the invented file sharing system without compromising the firewall, the agent program is configured to behave as follows. The agent reports to the central server the identities of files on the computer that will be provided if requested by others. When an email request for a file is received by the agent from the central server, the agent generates an email in response, attaching the requested file if that file is still on a list of files that may be provided by the agent.
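As a rough illustration of the firewall-friendly exchange, the sketch below builds the agent's email reply using Python's standard `email.message.EmailMessage`. The function name `build_reply`, the subject line, the message wording, and the representation of the shared-file list as a dictionary of file contents are all assumptions made for the example; sending the message is left out.

```python
from email.message import EmailMessage


def build_reply(reply_to, requested_name, shared_files):
    """Build the agent's email response to a file request.

    shared_files maps file names the agent is still willing to provide
    to their contents (bytes). The file is attached only if it is still
    on that list.
    """
    reply = EmailMessage()
    reply["To"] = reply_to
    reply["Subject"] = f"File request: {requested_name}"
    if requested_name in shared_files:
        reply.set_content("The requested file is attached.")
        reply.add_attachment(
            shared_files[requested_name],
            maintype="application",
            subtype="octet-stream",
            filename=requested_name,
        )
    else:
        reply.set_content("The requested file is no longer shared.")
    return reply
```

Because the response is ordinary outbound email, no inbound connection through the firewall is needed in this sketch.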

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a functional block diagram of a conventional search engine for the world wide web.

FIG. 2 is a block diagram showing the architecture of a search engine for actively indexing the world wide web according to one embodiment of the present invention.

FIG. 3 is a functional block diagram of the central server of FIG. 2.

FIG. 4 is a bubble chart illustrating the generation and processing of a brochure file in the indexing system of FIG. 2.

FIG. 5 is a bubble chart illustrating the process of the agent program in updating itself along with the local index generated by the agent program.

FIG. 6 is a bubble chart illustrating the process executed by the queue manager of FIG. 3 in queuing update entries and transferring these entries to the remote queue manager of FIG. 3.

FIG. 7 is a bubble chart illustrating the process executed by the update process server of FIG. 3.

FIG. 8 is a bubble chart illustrating the overall data flow in the search engine of FIG. 3.

FIG. 9 is a functional block diagram of a distributed search engine according to another embodiment of the present invention.

FIGS. 10 and 11 are diagrams illustrating operation of a distributed accounting and inventory system on an intranet according to one embodiment of the present invention.

FIGS. 12-45 are figures illustrating components of the indexing system of FIG. 2 for a Java-based implementation of the indexing system according to one embodiment of the present invention.

DETAILED DESCRIPTION

This invention is preferably implemented as described in detail in pending U.S. patent application Ser. Nos. 09/419,405 and 09/575,971 by the same inventors which are incorporated by reference.

A domain name service (DNS) maps names (domain names) to addresses (Internet Protocol (IP) addresses). Domain names are scarce and expensive to obtain and maintain. A secondary DNS system could be built for the peer-to-peer network using peer-to-peer agents and the central index. Content providers could choose names (agent names) and those names would be associated with an agent indexing their site. Then, these names could be made known to others without providing the IP addresses, and the IP address can change and the content could still be found provided the agent name is not changed.
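A minimal sketch of such a secondary naming layer follows, assuming the central site keeps a simple in-memory mapping from agent name to current IP address. The class `AgentNameRegistry` and its methods are hypothetical names used only for illustration.

```python
class AgentNameRegistry:
    """Maps stable agent names to the agent's most recently reported IP address."""

    def __init__(self):
        self._addresses = {}

    def register(self, agent_name, ip_address):
        """Called when an agent connects; overwrites any earlier address."""
        self._addresses[agent_name] = ip_address

    def resolve(self, agent_name):
        """Return the current address for an agent name, or None if unknown."""
        return self._addresses.get(agent_name)
```

Since lookups go through the agent name, a provider whose address changes only needs its agent to register again; the name given to others stays valid.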

FIG. 2 is a block diagram of an indexing system 200 for actively indexing the Internet according to one embodiment of the present invention. The system 200 includes a central server 202 that stores a central index and processes search queries received over the Internet and also includes agent programs or agents 204 that reside on respective remote servers 208 and operate to provide periodic index updates to the central server 202, as will be described in more detail below. The system 200 also includes brochure files or brochures 206 residing on respective remote servers 208, each brochure file containing non-HTML or conceptual information about the Web site for use in generating the central index on the server 202, as will also be explained in more detail below. For the sake of brevity, only two remote servers 208 and the corresponding agents 204 and brochures 206 are shown in FIG. 2. The system 200, however, includes numerous such remote servers 208, agents 204, and brochures 206, as will be understood by those skilled in the art.

Each of the components in the central server 202 will now be described generally, with these respective components being described individually in more detail below. The central server 202 includes a router 210 that directs packets comprising search requests and update transactions through a load balancing switch 212 to an appropriate set of servers 214, 230 and 222. The switch 212 balances traffic to all web servers 214 to prevent overloading respective web servers and improve overall performance of the central server 202. The router 210 also functions to allow offline updates of index server sets 216 and as a dispatch point to prevent searches from being applied to an index server currently receiving updates, as will be explained in more detail below. The web servers 214 receive and preprocess index queries and receive and process brochure 206 generation or modification requests. In addition, the web servers 214 generate the parallel queries necessary to perform a search using the index servers 216. In one embodiment of the central server 202, there are twenty web servers 214.

The central server 202 further includes a master index server 218 containing a master copy of the entire central search index or catalog. In the embodiment of FIG. 2, the master index server 218 has a redundant array of independent disks or RAID 5 to provide protection against disk failures and loss of the central search index. In addition, the central index stored on the master index server 218 is also stored on a remote master index server 220 at a different physical location to provide backup of the central search index.

A number of update servers 222 each receive updates from the agent programs and store the current version of the agent program for download and update of the local agent programs, as will be described in more detail below. In addition, the update servers store the digital signature of the agent program and also store the remote web hosts' last local index, which are utilized during the updating of the remote agent program and during updating the local index, as will also be discussed in more detail below. Each of the update servers 222 applies all index change transactions through a firewall/router 224 to the master index server 218 which, in turn, updates the central search index and then distributes those changes to the various index server sets 216. The master index server 218 also sends instructions to the Name Space/Directory Server 233 to dynamically determine which set of index servers 216 is to remain on-line to service search requests, and which set is to receive the updates.

The central search engine 202 further includes a brochure database server 226 and brochure check server 228. The brochure database server 226 stores a brochure database as a list of brochures and their associated data fields for each web site. The web servers 214 may request records from or add records to this brochure database depending on the actions taken by web site administrators while maintaining their brochure entries. The brochure check server 228 periodically checks for valid new brochures as defined within the brochure database server for web sites that are not being processed by a local agent program, as will be described in more detail below. If the defined brochure in the brochure database server 226 is not found by the brochure check server 228, a notification is sent to the administrator of the site where the brochure was supposed to be found.

When a brochure file is requested for a site which is not served by an agent 204, a message is sent to the Internet Service Provider (“ISP”) or system administrator for the site hosting the web site, indicating that users of the system are requesting brochures. This server also periodically checks the validity of existing brochures on all sites and notifies the web site administrator if a brochure file is missing. If a brochure is missing and remains missing for a given number of check cycles, the brochure check server 228 sends a request to the brochure database server 226 to delete the entry for the brochure. The brochure check server 228 detects any changes in brochures, such as additions or removals, and converts these changes to transaction batches that are forwarded to a queue manager which, in turn, applies these changes to update the central index on the master index server 218, as will be described in more detail below. The brochure check server 228 periodically verifies the status of all brochures at sites that are not being indexed by an agent 204.

The components of the central server 202 and their general operation have been described, and now the operation of the agent 204 and brochure 206 will be described in more detail. The agent 204 and brochure 206 may both be present at a remote server 208. A brochure 206 and agent can function independently of each other, as will be discussed in more detail below. The agent 204 is a small local program which executes at the remote server 208 and generates an incremental search engine update for all of the participating web sites on the web host 208. These index updates are transmitted by the agent 204 to the central server 202, where they are queued for addition to the central index.

The agent 204 runs on a system, such as a web host server, at the site of an organization, and processes content (objects) for all web sites available via mass storage from that system. The agent 204 processes all web sites located within the mass storage area to which it has access, unless configured to exclude some portion of a site or sites. The agent 204 uses the local web server configuration (object catalog or file system information) data to determine the root directory path (or other location information for the particular file system) for all web site file structures available. The agent 204 reads files directly from local mass storage, and indexes the keywords from the files and meta data about the files. In contrast, a spider program, as previously discussed, is located on a server remote from the local site and renders each web page file before tokenizing and parsing each page for indexing. The agent 204 follows the structure of the local mass storage directory tree in indexing the files, and does not follow uniform resource locators (“URLs”) stored within the HTML files forming the web pages. Since the agent 204 is present at the remote server 208 and has access to files stored on the server's mass storage, the agent is potentially capable of retrieving non-html data for indexing from these locally stored files, such as database files and other non web-page source material. For example, a product catalog stored in a database file on the remote mass storage may be accessed and indexed by the agent 204.

While indexing the web sites at the remote server 208, the agent 204 recognizes brochures 206 stored at web sites on the server and provides index updates based on the contents of the brochures found. Once the agent 204 has indexed the web sites at the remote server 208, the agent transmits a transaction list to the central server 202, and this transaction list is stored on one of the update servers 222. The transaction list is referred to as a batch, and each batch contains a series of deletion and addition transactions formatted as commands. More specifically, each batch represents an incremental change record for the sites at the remote server 208 serviced by the agent 204. The update server 222 thereafter transfers each batch to the master index server 218 which, in turn, updates the master index to reflect the index changes in the batch. It should be noted that the agent 204 transmits only “incremental” changes to the central server 202. Conversely, a conventional spider program requests the entire rendered HTML page from the remote web site via the remote server 208, and then parses the received page for keyword information.
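The idea of a batch as an incremental change record can be sketched as a set difference between the previous and current local index. The representation below, an index as a set of (keyword, path) pairs and commands as ("DELETE" or "ADD", keyword, path) tuples, is an assumption made for illustration; the actual command format is not given in this section.

```python
def build_batch(previous, current):
    """Return deletion commands followed by addition commands.

    Each index is a set of (keyword, path) pairs. Only entries that
    differ between the two indexes appear in the batch.
    """
    deletions = [("DELETE", kw, path) for kw, path in sorted(previous - current)]
    additions = [("ADD", kw, path) for kw, path in sorted(current - previous)]
    return deletions + additions


def apply_batch(index, batch):
    """Apply a batch to a copy of an index, as the central site would."""
    updated = set(index)
    for command, kw, path in batch:
        if command == "DELETE":
            updated.discard((kw, path))
        else:
            updated.add((kw, path))
    return updated
```

Applying the batch to the previous index reproduces the current one, so unchanged entries never need to be transmitted.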

The brochure 206 is a small file that may contain conceptual and other non-HTML information which would be useful to improve the indexing of sites or parts of a single site on the remote server 208. A brochure 206 may contain any information pertinent to the web site, including but not limited to keywords, phrases, categorizations of content, purpose of the site, and other information not generally stored in a web page. The brochure 206 is generated manually by individual web site administrators. The administrator fills out a form at the central server 202, and receives an email containing the brochure 206 or downloads the brochure after submitting the form contents. Upon receiving the brochure 206, the administrator stores it within the file structure of the web site on the remote server 208. There may be multiple brochures 206 at the same web site, each describing specific portions of the site. Each brochure 206 may refer to a single web page or a group of web pages stored within a specific subdirectory at the web site. All information stored in each brochure 206 is applied to the pages referenced in the brochure.

The overall operation of the central server 202 will now be described in more detail with reference to the functional block diagram of FIG. 3. In FIG. 3, many components previously discussed with reference to FIG. 2 are shown, and for the sake of brevity the detailed operation of each such component will not again be described in detail.

In operation, the central server 202 performs three primary functions: 1) processing search queries from remote users; 2) brochure generation and verification; and 3) index update processing. In processing search queries from remote users, the Web servers 214 receive search queries from remote user browsers. A router, which corresponds to the routers 210 and 212 in FIG. 2, directs the search query to the appropriate web server 214. The web server sends the query to a Query Processor 234 which parses the query and sends it to the available index server set 216 or 217 as listed in the Name Space Server 233 for the appropriate segment of the index. The selected index server sets 216 or 217 thereafter return search results to the query processor in response to the applied search query, and these search results are sent to the Web server 214, which, in turn, returns the search results to the remote user browser.

The central server 202 also allows remote users to generate and download brochures 206 to their remote site, and also verifies the validity of brochures 206 on Web sites not serviced by an agent 204, as will now be explained in more detail. The Web servers 214 receive and process brochure 206 generation or modification requests from user browsers. Once the brochure 206 has been generated or modified, the brochure is transferred to the brochure database server 226, which stores all existing brochures. The brochure check server 228 periodically checks for new brochures 206 stored on the brochure database server 226 for Web sites that are not served by an agent 204. When a brochure 206 is requested for a Web site which is not served by an agent 204, the brochure check server 228 sends a message to the system administrator or Internet service provider for the server hosting a Web site telling them that site administrators on their server are requesting brochures 206. The brochure check server 228 also periodically verifies the validity of existing brochures 206 on all sites not serviced by an agent 204. If a brochure 206 is missing for a predetermined number of verification cycles, the brochure check server 228 instructs the brochure database server 226 to delete the entry for that brochure. The brochure check server 228 also converts any modifications, additions, or deletions to brochures 206 to transaction batches, and forwards these transaction batches to a queue manager 302. The queue manager 302 receives brochure update transaction batches from the brochure check server 228 and also receives agent update transaction batches from the agent update server 222, as will be described in more detail below.

The central server 202 also performs index update processing to update the central index stored on the master storage server 218 and the segmented central index stored on the index servers 216, 217, as will now be described in more detail. As described above, the queue manager receives update transaction batches from the brochure check server 228 and the agent update server 222. The agent update server 222 receives queries from the agent as to the current state of the agent's version and the status of the last index updates of the site. If the agent is not of a current version, a current version is automatically transmitted and installed. If the state of the site indexing is not consistent, as determined by comparing the digital signatures representing the state of the site and the state of the central index the last time an update was received and successfully processed and added to the central index, then the agent will roll back to the previous state and create the necessary additions and deletions to bring the state of the site and the central index into agreement. The agent 204 will then send the additions and deletions along with a current digital signature to the queue manager 302. The queue manager 302 receives incremental index updates from the agents 204 present on the remote servers 208 and converts these updates into update transaction batches which, in turn, are transferred to the update processing server 306. The queue manager 302 stores the received update transaction batches, and periodically transmits a copy of the stored transaction batches to a remote queue manager 304 for processing by update processing server 306 and being applied to the remote master storage server 220. The queue manager 302 also periodically transmits a copy of the stored transaction batches to an update processing server 306. The queue manager 302 stores update transaction batches received from the agent 204 during a predetermined interval, and upon expiration of this interval the update batches are transferred to the update processing server 306. Upon receiving the update transaction batches, the update processing server 306 applies all the batches to update the central index stored on the master storage server 218. Once the central index stored on the master storage server 218 has been updated, the master storage server 218 applies the update transaction batches through the router to update the segmented central index stored on the index server sets 216, 217.

During updating of the segmented central index stored on the index server sets 216, 217, the update transaction batches are directed to only one set of index servers 216, 217 while the other set remains online to handle search queries; thereafter, the updated set of index servers 216, 217 is placed online and the set previously online is updated. For example, assume the index servers 216 are the primary set of index servers and the servers 217 are the secondary set. Each index server set 216, 217 can contain all or a portion of the central index 218. As seen from the above example, the primary and secondary index server sets 216 and 217 eliminate the need for record locking of the segmented central index to which search queries are applied. Thus, all records of the segmented central index are always available for search queries. Moreover, if one server of the primary index server set 216 or 217 fails, the remaining servers of that set will continue to serve queries. If the entire server set fails, the corresponding secondary index server set is made the primary so that the entire segmented central index is available for applied search queries. It should be noted that in the unlikely event that both the primary and secondary index server sets 216, 217 for a particular segment of the central index simultaneously fail, the remaining segments of the central index remain available for applied search queries, and only the segment of the central index stored on the failed index servers becomes unavailable. In other words, search queries are still applied to the vast majority of the central index so that reasonable search results are still obtained. In a case where both server sets fail, queries for the segment that had failed could be sent to the central index.
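The alternating update of the two index server sets can be sketched as follows. The class `IndexServerPair`, the use of the reference numerals "216" and "217" as dictionary keys, and the batch format of (command, word, url) tuples are illustrative assumptions, and each "server set" is reduced to an in-memory dictionary.

```python
class IndexServerPair:
    """Two copies of an index segment; only the offline copy is ever written."""

    def __init__(self):
        self.sets = {"216": {}, "217": {}}  # set name -> {word: set of urls}
        self.online = "216"                 # set currently answering queries

    def _other(self, name):
        return "217" if name == "216" else "216"

    @staticmethod
    def _apply(index, batch):
        for command, word, url in batch:
            if command == "ADD":
                index.setdefault(word, set()).add(url)
            elif command == "DELETE":
                index.get(word, set()).discard(url)

    def search(self, word):
        """Queries always read from the set that is currently online."""
        return set(self.sets[self.online].get(word, set()))

    def update(self, batch):
        """Update the offline set, swap roles, then update the other set."""
        offline = self._other(self.online)
        self._apply(self.sets[offline], batch)
        previously_online = self.online
        self.online = offline
        self._apply(self.sets[previously_online], batch)
```

Because writes only ever touch the set that is not serving queries, no record locking is needed in this sketch, which mirrors the rationale given above.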

The index server set or sets are used to provide query results for searches submitted by the Web Servers. Each set of servers is identical, and each set of servers contains a portion of the overall index. Initially, the division will be alphabetical and numerical, for a set of 36 servers. Server “A” would contain the index for all words beginning with “A”. Only one set of servers is updated at a given time, while the other set remains on-line to service search requests. This permits the system to be run without file-locking constraints and allows for fail over should a server become inoperative.
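The alphabetical and numerical division into 36 segments can be expressed as a simple routing function. The name `segment_for` and the choice to raise an error for words that start with any other character are assumptions; the text does not say how such words are handled.

```python
import string

# 26 letters plus 10 digits gives the 36 segments mentioned in the text.
SEGMENTS = string.ascii_uppercase + string.digits


def segment_for(word):
    """Return the label of the index server holding entries for this word."""
    if not word:
        raise ValueError("empty word has no segment")
    first = word[0].upper()
    if len(first) != 1 or first not in SEGMENTS:
        # Handling of other leading characters is not specified in the source.
        raise ValueError(f"no segment defined for {word!r}")
    return first
```

For example, `segment_for("apple")` routes to server "A", matching the example in the paragraph above.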

FIG. 4 is a bubble chart illustrating the generation and processing of a brochure 206 in the indexing system 200 of FIG. 2. As previously mentioned, the purpose of the brochure 206 is to allow the web host 208 and the web site to provide specific non-HTML information, which will help the central server 202 in indexing the site and provide more relevance to query results. The brochure 206 can be created in two ways. First, as part of the installation program for the agent 204, the administrator of the remote server 208 completes a form that is converted to an encoded brochure file 206, and then copied into the web directory on the remote server 208. This method of generating the brochure 206 will be discussed in more detail below. The second method of generating the brochure 206 utilizes a brochure creator interface on the web servers 214 at the central server 202. This method will now be described in more detail with reference to FIG. 4.

To create a brochure 206 using the brochure creator interface, a user's browser 400 applies a brochure generation request 402 to the associated central site web server 214. In response to the request 404, the brochure creator interface generates a form which the user completes, and then sends a brochure request 406 to the brochure server 226, which generates an encoded brochure file that is then sent to the central site web server 214. The central site web server 214 then sends the encoded brochure file to the user's browser 400. The encoded brochure file 206 is then stored in local storage 408. Subsequent to receiving the encoded brochure file 206, the user sends the encoded brochure file 206 via the user's web browser 400 to the web host site storage 410 (e.g., the web site host computer).

The brochure server 226 stores the brochure data 407 in a brochure database 424 on the central server 202 once it has been generated as a result of a brochure generation request 404. To verify proper storage of encoded brochure files 206, the brochure check server 425 retrieves brochure data 420 from the brochure database 424 and sends a request 416 to the web host server 404 to retrieve the encoded brochure file 206 from the web host site storage 410. Upon successful retrieval of the brochure file 206, the brochure check server generates and transmits object references 422 created as a function of the brochure data 420 to the queue manager 302. The queue manager 302 thereafter updates the central index to include the generated object references.

The directory structure of the host and web site are used to determine the relevance of the information in the brochure. Information in a brochure located in the root directory will apply to all sub-directories unless superseded by another brochure. Information in a directory brochure will apply to all subdirectories unless superseded by information in a subdirectory brochure. Where a brochure is placed determines the content to which the information applies. A web site owner can have as many brochures as there are pages or directories in his site. A site owner can request that their site be excluded from the Index by checking the EXCLUDE box next to the URL and copying the brochures into the directory to be excluded.
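The precedence rule, in which the brochure nearest to a page in the directory tree wins, can be sketched by walking up from the page's directory to the root. The function name `applicable_brochure` and the representation of brochures as a dictionary keyed by directory path are assumptions made for illustration.

```python
import posixpath


def applicable_brochure(brochures, page_path):
    """Return the brochure governing a page, or None if none applies.

    brochures maps a directory path (e.g. "/" or "/products") to the
    brochure stored there. The search starts in the page's own directory
    and moves toward the root, so a deeper brochure supersedes a
    shallower one.
    """
    directory = posixpath.dirname(page_path)
    while True:
        if directory in brochures:
            return brochures[directory]
        if directory in ("/", ""):
            return None
        directory = posixpath.dirname(directory)
```

With brochures at "/" and "/products", a page under "/products/tools" is governed by the "/products" brochure, while a page under "/about" falls back to the root brochure.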

The host uses the configuration section of the agent program to create site brochures, and can create site brochures for an entire IP address or for any subsection of the site.

In addition to the host brochure, a web site owner may also place a site brochure on his web site. The purpose of the site brochure is to allow the web site owner to provide specific conceptual or non-html information, which will help in indexing their site.

The web site owner can create a different site brochure for each page or directory on the site. For example, if the web site includes pages in different languages, the web site owner should create a site brochure for each language with keywords and categories that match the language. Once the web site owner has filled in the brochure form, they will click a button on a web page from the web server at the central server, and a web server creates an encoded html file that is then sent or downloaded to the site owner's computer. Each encoded brochure file could be given a particular name, such as brochure—domainname-com-directory-directory-directory.html, and the site owner is instructed to copy the encoded file into the specified web directory on the site.

At any time, the web site owner can visit the central server site, update their brochure, and download a new encoded brochure. When updating an existing brochure, the current brochure information for the URL entered will be displayed to reduce input time. Any site brochure will supersede the host brochure information, and information contained in the site brochure will be assumed to be more current and accurate and will be used by the agent for indexing purposes. A site brochure that is farther down in the directory tree from the root directory will supersede a site brochure that is above it in the directory tree. A site owner can request that their web site be excluded from the index by checking the EXCLUDE box next to the URL and copying the brochures into the directory to be excluded.

If the host or web site URL is not currently being indexed, the webserver performs the following operations. First, an automatic email issent to contacts at the host to encourage the host to install the agent.An automatic email is also sent to a contact person for the web sitewith a “Thank You” and a request that they ask their host to install theagent. In addition, a retrieval order is generated for the centralserver to retrieve the brochure file from the web site in one hour. Ifthe retrieval order is unsuccessful, it will be repeated 2, 4, 8, 24 and48 hours later, until successful. If still unsuccessful after 48 hours,the retrieval order is canceled. By verifying the presence of the sitebrochure in the specified location, unauthorized information about asite may not be created by a third party in an attempt to have theirsite indexed along with a more popular site. This is a common problemwith existing search engines where a third party copies the keywordsfrom a meta tag in a popular site. The bogus site with copied keywordsis then submitted to a search engine for indexing, and when searchqueries are applied to the search engine that produce the popular sitethe bogus site is also produced. This may not be done with the sitebrochure because the brochure is not an html page available to outsidepersons and because it is encrypted so even if the file is obtained theinformation contained therein is not accessible.
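The retrieval schedule can be sketched as below. The text leaves open whether the 2, 4, 8, 24 and 48 hour retries are measured from the creation of the order or from the first attempt; this sketch assumes offsets from the time the order is created. The names `RETRY_OFFSETS_HOURS`, `attempt_times`, and `run_retrieval_order`, and the `try_fetch` callback, are hypothetical.

```python
from datetime import datetime, timedelta

# First attempt after one hour, then the retries listed in the text.
# Assumption: every offset is measured from creation of the order.
RETRY_OFFSETS_HOURS = (1, 2, 4, 8, 24, 48)


def attempt_times(created_at):
    """Return the times at which retrieval of the brochure is attempted."""
    return [created_at + timedelta(hours=h) for h in RETRY_OFFSETS_HOURS]


def run_retrieval_order(created_at, try_fetch):
    """Try each scheduled attempt in turn.

    try_fetch(when) should return True if the brochure was retrieved.
    Returns the time of the successful attempt, or None if the order is
    canceled after the final attempt fails.
    """
    for when in attempt_times(created_at):
        if try_fetch(when):
            return when
    return None
```

A real scheduler would wait until each attempt time; here the callback simply receives the scheduled time so the policy can be examined in isolation.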

Software to create brochures and agent programs will be distributed free to software publishers for inclusion in their web authoring software and to web server manufacturers, publishers and OEMs for pre-loading on or inclusion with their products.

FIG. 5 is a bubble chart of the process executed by the agent 204 according to one embodiment of the present invention. As previously mentioned, the agent 204 periodically executes the illustrated process to update itself and to update the corresponding local index, as will now be described in more detail. The process begins in step 500 in which the agent verifies that it is the most current version of the agent program. More specifically, in step 500 the agent sends a request 502 to one of the update servers 222 for the digital signature of the current version of the agent program. The update server 222 returns the digital signature 504 for the most current version of the agent. In step 500, the digital signature hash of the local agent is compared to the returned digital signature hash to determine whether the local agent is the most current version. In other words, if the two digital signatures are equal, the local agent is the most recent version, while if the two are not equal the local agent is an outdated version of the agent program and must be updated. When the two digital signatures are unequal, the program goes to step 506 in which the most current version of the agent program 508 is received from the update server 222. Once the local agent program has been updated, the program proceeds to step 510. Note that if the digital signature of a local agent program is equal to the digital signature 504 of the most recent version of the agent, the program proceeds directly from step 500 to step 510.
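Steps 500 and 506 amount to a compare-then-download check, sketched below. The description does not name a signature algorithm, so SHA-256 is used purely as an assumption, and `signature`, `ensure_current_agent`, and the `download` callable are hypothetical names.

```python
import hashlib
from typing import Callable


def signature(data: bytes) -> str:
    """Stand-in for the 'digital signature hash' (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def ensure_current_agent(
    local_agent: bytes,
    current_signature: str,
    download: Callable[[], bytes],
) -> bytes:
    """Step 500: compare signatures; step 506: fetch the new agent if they differ."""
    if signature(local_agent) == current_signature:
        return local_agent  # already current, proceed directly to step 510
    return download()  # outdated, replace with the current agent program
```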

In step 510, the agent program compares the digital signature hash for the existing local index previously generated by the agent program to the digital signature hash stored on the central server 202 for the existing local index. The agent program performs this step to synchronize the local index and the remote local index stored on the central server 202 by ensuring the digital signature of the existing version of the local index matches the digital signature for the existing version of the remote local index. If the two digital signatures are equal, the agent program goes to step 512 and generates an updated local index by evaluating, such as by tokenizing and parsing, local files 513 on the web host serviced by the agent. Once the updated local index has been generated, the agent program proceeds to step 514 where the updates along with the digital signature of the new local index are transferred to the agent update server 222 on the central server 202.

If step 510 determines the two digital signatures are not equal, the agent program goes to step 516 to roll back to a previous state that matches the local files 513 or to generate a completely new local index for the web host serviced by the agent. After the complete new local index is generated, the agent program once again proceeds to step 514 and the updates are transferred to the agent queue manager 302. As previously mentioned, comparing the digital signatures in step 510 synchronizes the local index and remote local index. Furthermore, this step enables the agent program to rebuild a completely new local index for the site serviced by the agent program in the event the index is lost at the central server 202. Thus, should the central server 202 crash such that the central index is corrupted and non-recoverable, the agent programs at each remote web host will rebuild their respective local indices, and each of these local indices will be transferred to central server 202 so that the entire central index may be reconstructed.

As mentioned above, the agent 204 is a software program that a web host downloads from the web servers 214 and installs on the host's server. To install the agent 204, the host runs an agent installation program, which collects information about the web site host and about the site itself, and also creates the web site host's brochure 206 of non-HTML information. As part of the installation, the site host schedules a preferred time of day for the agent 204 to automatically index the web site and transfer index updates to the central server 202. The agent and the queue manager can work independently or together to reschedule when to perform and transmit the site update. Resource availability is the primary factor, and any other factor that may affect the quality or efficiency of the operation may be used by the agent and the queue manager in rescheduling updates.

In the preferred embodiment the agent 204 initiates all communications with the central server over a secure socket authorized and set up by the site host. But the central server 202 could also initiate communications or trigger actions of the agent or retrieve data processed by the agent. All data and program updates sent between the site host and the central server are sent in compressed and encrypted form. During the normal index updating process, the agent 204 is automatically updated, as will be explained in more detail below. The site host may receive a daily email saying the site had been properly updated or that no update was received and no action is required. The agent 204 also maintains a log of indexing activity and errors encountered, and this activity log can be viewed by the site host by opening the agent 204 and accessing the log. Although the agent 204 automatically indexes the sites on the host at scheduled times, the host can at any time initiate an indexing update by opening the agent 204 and manually initiating an index update.

In operation, the agent 204 verifies that the agent program is current and that the site index matches the last update received and successfully added to the central index on the central server 202. After verification and updating of the agent 204 if required, the agent checks the site for new, modified or deleted files. The new or modified files are indexed and the information is added to or deleted from the site index, or a list of addition and deletion transactions is created. The incremental changes to the site index along with a digital signature of the entire site index are sent to the central server 202 and the results logged in a site activity log maintained by the agent 204. The agent 204 is run by either being manually started by the site host or automatically started by a scheduler component of the agent.

It is not necessary that a local index be maintained at the site but only that a list of digital signatures representing the site at the time of the last update be maintained. The digital signature could be used to determine whether the local site and the central index are properly synchronized and which portion of the site had changed since the last successful update. Then instructions to delete all references from the central index 218 to files located at the web host that have changed or which no longer exist would be sent by the agent to the queue manager. New references would then be created for all new or modified files and would be sent by the agent to the queue manager as additions to the central index 218.

The process executed by the agent 204 will now be described in more detail. The agent 204 first checks with the central server 202 for the current version of the agent program. More specifically, the agent 204 calculates a digital signature of the agent program files and contacts the central server 202 over a secure socket. The agent 204 then requests a digital signature of the current version of the agent program files located at the central server 202, and compares the two digital signatures. If the two signatures match, the version of the agent 204 is current and no update is required. When the two signatures do not match, the current version of the agent 204 is downloaded from the central server 202. Once the current agent 204 is successfully downloaded, the new agent program files are installed and the agent restarted.

At this point, the agent 204 begins the process of updating the index of the local site. First, the agent 204 determines whether the last index update was completed and transmitted successfully. If not, the agent 204 renames the Old-Site-Index file to Site-Index and the Old-Site-File-List to Site-File-List. The agent 204 then calculates a digital signature for the Site-Index file and a signature for the Site-File-List file and compares each to the digital signatures created at the end of the last successful update for Site-Index and Site-File-List files. If the digital signatures match, the agent 204 sends them to the central server 202 for comparison and waits for confirmation.

If the central server 202 does not confirm the match of the digital signatures (i.e., the signatures for the Site-Index and Site-File-List files on the central server 202 do not match those on the remote site), the agent 204 deletes the Site-Index and Site-File-List files, and notifies the central server 202 to delete all site records. Next, if the agent 204 was updated and fields were added or deleted from the Site-Index file, then the agent updates the Site-Index file to include the updates. The agent 204 then determines if the Site-File-List file exists and, if so, renames the Site-File-List file to Old-File-List and creates a text file named Site-File-List. If no Site-File-List exists but Old-File-List exists, the agent 204 copies the Old-File-List file to Site-File-List. If no Site-File-List and no Old-File-List files exist, the agent 204 creates a text file named Site-File-List. The agent 204 then calculates a digital signature hash for each file on the site and the host brochure and records the file name including full path and digital signature hash of all files.

If the central server 202 verifies that the digital signature hash of the Site-Index file and the digital signature hash for the Site-File-List file match, the agent 204 verifies the brochure files. More specifically, the agent 204 determines whether the file name of each brochure.html file matches the directory in which it is located. If the file brochure.html is not in the expected directory, the agent 204 sends a warning email to the site contact listed in the brochure, and then renames brochure.html to WrongDirectorybrochure.html.

If the agent 204 determines that all brochure.html files match the directory in which they are located, the agent 204 deletes a file named Exclude-File-List, creates a text file named Exclude-File-List, checks brochures for EXCLUDE sites flags, and adds file names of files to be excluded from the index to the Exclude-File-List file. The agent 204 then creates a Deleted-File-List file containing a list of files that no longer exist on the site in their original location. More specifically, the agent 204 deletes the old Deleted-File-List file, creates a text file called Deleted-File-List, compares the Site-File-List file to Old-File-List file and records in the Deleted-File-List any files in the Old-File-List that are not in Site-File-List.

The agent 204 then creates a New-File-List file containing a list of files that were created or modified since the last update. To create the New-File-List file, the agent 204 deletes the current New-File-List file, creates a new text file called New-File-List, compares the file Site-File-List to the file Old-File-List and the file Exclude-File-List, and records in the New-File-List file any files in Site-File-List that are not in the Old-Site-File-List or in Exclude-File-List files.
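The two list-building steps above are essentially set comparisons, as the following sketch illustrates. It assumes each file list is held in memory as a mapping from full path to digital signature hash, so that a modified file shows up as an entry whose hash differs from the old list. The function names and the sample data are hypothetical.

```python
from typing import Dict, List, Set


def deleted_file_list(site: Dict[str, str], old: Dict[str, str]) -> List[str]:
    """Files recorded in Old-File-List that are no longer in Site-File-List."""
    return sorted(path for path in old if path not in site)


def new_file_list(
    site: Dict[str, str], old: Dict[str, str], exclude: Set[str]
) -> List[str]:
    """Files that are new or whose hash changed, minus excluded files."""
    return sorted(
        path
        for path, sig in site.items()
        if old.get(path) != sig and path not in exclude
    )


# Hypothetical lists: path -> signature hash.
old = {"/index.html": "a1", "/about.html": "b2", "/old.html": "c3"}
site = {"/index.html": "a1", "/about.html": "b9",
        "/news.html": "d4", "/private.html": "e5"}
exclude = {"/private.html"}

print(deleted_file_list(site, old))       # ['/old.html']
print(new_file_list(site, old, exclude))  # ['/about.html', '/news.html']
```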

Next, the agent 204 indexes the corresponding site and creates a new Site-Index file. More specifically, the agent 204 determines if the Site-Index file exists, and, if yes, copies the Site-Index file to an Old-Index file. If the Site-Index file does not exist, the agent determines if the file Old-Site-Index exists, and if yes copies the Old-Site-Index file to Site-Index file. If Old-Site-Index file does not exist, the agent 204 copies a Sample-Site-Index file to the Site-Index file.

The agent 204 then creates a New-Records-Index file and a Deleted-Records-List file. The agent 204 next removes records of deleted or modified files from the Site-Index. More specifically, the agent 204 deletes all records from Site-Index for files in New-File-List, deletes all records from Site-Index for files in Deleted-File-List, and records the Host IP, URL, and record ID Numbers for each record deleted into Deleted-Records-List.

The agent 204 then runs an indexing program against all files in the New-File-List file and creates a record for each new key word, phrase, MP3, Video, Movie, Link and brochure information and adds these to the Site-Index file. The agent 204 then copies each new record created to the New-Records-Index file. If new fields were added to the Site-Index, the agent 204 runs the indexing program against all files for new field information and creates records in Field-Update-Index for all information found. The agent 204 then updates the Site-Index file from the Field-Update-Index file.

At this point, the Site-Index file has been updated, and the agent 204 calculates a digital signature for the Site-Index file. More specifically, the agent determines if the Update-Status file exists, and if so opens this file. If the Update-Status file does not exist, the agent 204 creates a text file called Update-Status and opens this file. The agent 204 then calculates the digital signature of the Site-Index file, and records the Site-Index digital signature along with the date and time in the Update-Status file. Next, the agent 204 calculates the digital signature of the Site-File-List file, and records the Site-File-List digital signature along with the date and time in Update-Status file.

Finally, the agent 204 creates a Site-Map file for the sites serviced by the agent. More specifically, the agent 204 determines whether the Deleted-File-List or New-File-List contain files, and, if yes, the agent deletes the Site-Map file. The agent 204 then generates a site map for the Site-Map file from the Site-File-List. Once the Site-Map file has been generated, the agent 204 sends New-Records-Index and Deleted-Records-List files to the central server 202. More specifically, the agent 204 opens a secure connection and contacts the central server 202. The agent 204 then compresses the files to be sent, encrypts these files, and sends the compressed and encrypted files in the New-Records-Index, Field-Update-Index, Deleted-Records-List, digital signature for the Site-Index, Site-Map, and the Site-File-List to the central server 202, which then uses these files to update the central index. Once the agent 204 has successfully sent this information to the central server 202, the agent 204 records the digital signature of the Site-Index file, the time of the successful transfer, the date and size of the files transferred in the Update-Status file, and thereafter deletes the sent files. The agent 204 then closes the secure connection to terminate the update process.

FIG. 6 is a bubble chart illustrating the process executed by the queue manager 302 of FIG. 3 in queuing update entries and transferring these entries to the remote queue manager 304. The queue manager 302 receives update entries 600 from the agent update server 222 and update entries 602 from the brochure server 228, and places these update entries in an update queue 604. The entries in the queue 604 are transferred to a queue database 606. Once the queue 604 is done receiving update entries 600, 602, which may be when the queue is full or at predetermined intervals, the queue manager 302 goes to step 608 and retrieves the queue entries from the queue database 606 and sends them to the remote queue manager 304. As previously described, the update entries stored in the queue database 606 are thereafter processed by the update processing server 306 (see FIG. 3) to update the local master index on master index server 218 (see FIG. 3). The queue manager 302 also receives a deletion request (not shown) from the update processing server 306 and deletes update entries stored in queue database 606 in response to this deletion request, as will be explained in more detail below with reference to FIG. 7. The queue functions are preferably implemented using a customized version of the standard UNIX email handlers, where each inbound email represents a request for a file or for the content of a file.

FIG. 7 is a bubble chart showing the process executed by the update processing server 306. The process begins in step 700 with the update processing server 306 retrieving queue entries 702 from the queue manager 304. In the embodiment of FIG. 7, the queue entries 702 are retrieved periodically so that in step 700 the queue entries for the last N hours are retrieved. From step 700, the process proceeds to step 704 and the update processing server 306 applies the queue entries to the master index server 218 which, in turn, utilizes the queue entries in updating the master index, as previously described. Once the queue entries 702 have been applied to the server 218, the process proceeds to step 706 and the update processing server 306 applies a deletion request 708 to the queue manager 302 (see FIGS. 3 and 6). In response to the deletion request 708, the queue manager 302 deletes the update entries stored in the queue database 606 that have now been applied to the master index server 218. The central index on the master index server 218 has now been updated to include entries in the queue database 606, so these entries are deleted since they are now reflected in the central index and thus no longer needed.
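The interplay between the queue manager of FIG. 6 and the update processing server of FIG. 7 can be pictured with a small in-memory model. This is a simplified sketch, not the email-handler-based implementation mentioned above: the class `QueueManager`, the function `process_updates`, and the use of a plain list as the master index are all illustrative assumptions.

```python
from typing import Dict, Iterable, List


class QueueManager:
    """Holds update entries until they have been applied to the master index."""

    def __init__(self) -> None:
        self.queue_database: Dict[int, str] = {}
        self._next_id = 0

    def receive(self, entry: str) -> int:
        """Store an update entry from an agent or the brochure server."""
        entry_id = self._next_id
        self._next_id += 1
        self.queue_database[entry_id] = entry
        return entry_id

    def retrieve(self) -> Dict[int, str]:
        """Return a snapshot of the pending entries."""
        return dict(self.queue_database)

    def delete(self, entry_ids: Iterable[int]) -> None:
        """Handle a deletion request for entries that have been applied."""
        for entry_id in entry_ids:
            self.queue_database.pop(entry_id, None)


def process_updates(queue_manager: QueueManager, master_index: List[str]) -> int:
    """Retrieve entries, apply them to the index, then request their deletion."""
    entries = queue_manager.retrieve()
    for entry_id in sorted(entries):
        master_index.append(entries[entry_id])
    queue_manager.delete(entries.keys())
    return len(entries)
```

Deleting only the entries that were in the retrieved snapshot means that an entry arriving while updates are being applied stays queued for the next pass.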

FIG. 8 is a bubble chart illustrating the overall data flow between the search engine 202, agent, and brochure components of the active indexing system 200. Each aspect of the overall data flow has already been described in a corresponding section above, and thus FIG. 8 will now be described merely to provide a brief description of the overall data flow of the indexing system 200 according to one embodiment of the present invention. The components of the process in FIG. 8 may be logically broken into two functional groups, an indexing group and a searching group. In the searching group, a user 800 applies a search request to one of the web servers 214, which processes the search request and applies it to selected ones of the index servers 216, 217. In response to the applied search request, each of the search index servers 216, 217 queries its corresponding local index segment 802 and generates search data. The index servers 216, 217 then return the search results to the web server 214, which, in turn, provides the user 800 with the search results corresponding to his applied search request.

The web servers 214 also handle version queries from agents 204 on source sites. Each agent 204 sends a version check 804 that is processed by one of the web servers 214. In response to the version check 804, the web server 214 returns the digital signature of the most recent version of the agent 204, and also supplies the updated version 806 of the agent 204 to the source site if an update is required.

The remaining components in FIG. 8 are in the indexing group. The queue manager 302 receives updates from each of the agents 204 and from the brochure check server 228, which services sites without an agent 204 as previously described. The queue manager makes updates and deletions to the queue database 602 corresponding to the received updates, and also provides a mirror copy of these updates to the remote queue manager 304. The update processing server 306 retrieves the update entries from the queue manager 302, and applies the updates to the master index servers 218. The server 218 updates the master index to include the applied updates, and the update processing server 306 then sends a deletion request to the queue manager 302 to delete the corresponding entries from the queue database 602.

Once the master index server 218 has updated the master index, the server updates the segmented index stored on the search index servers 216, 217 as previously described. Each of the search index servers 216, 217 updates its corresponding portion of the segmented index in response to the updates from the master index server 218. As previously mentioned, the entire segmented index stored on the index servers 216 is continuously available for processing search requests even during updating of the segmented index. The entire segmented index is available due to the redundant architecture of the servers 216, 217, as previously described.

FIG. 9 is a functional block diagram of a distributed search engine 900 according to another embodiment of the present invention. The search engine 900 includes a central search engine 902 connected over a network 904, such as the internet, to a plurality of agents 906, each agent being resident on a respective server 908. Each agent 906 generates a list of digital signatures related to retrievable information on the corresponding server 908 and provides these signatures to the search engine 902 which determines which files to access for updating its index, as will now be explained in more detail. In the following description, the server 908 is a standard web server, but one skilled in the art will appreciate that the distributed search engine 900 can be implemented for a number of other services available on the internet, including but not limited to email servers, ftp servers, “archie”, “gopher” and “wais” servers. Furthermore, although the agent 906 is shown and will be described as being on the web server 908, the agent 906 need not be part of the program which processes requests for the given service.

In operation, the agent 906 periodically generates a list of signatures and accessible web pages, which are then stored on the local web server 908. The digital signature generated by the agent 906 could be, for example, a digital signature of each file on the server 908. The list of digital signatures is then transmitted by the agent 906 to the search engine 902, or the search engine 902 may retrieve the list from the servers 908. A digital signature processing component 910 in the search engine 902 then compares the retrieved digital signatures against a historic list of digital signatures for files on the server 908 to determine which files have changed. Once the component 910 has determined which files have changed, a spider 912 retrieves only these files for indexing.

The digital signatures may be stored in an easily accessible file format like SGML. Alternatively, the digital signatures could be generated dynamically when requested on a page-by-page or group basis. This would ensure that the signature matches the current state of the file. In addition, several new commands would be added to the standard http protocol. The new commands perform specified functions and have been given sample acronyms for the purposes of the following description. First, a command GETHSH retrieves the digital signatures for a given URL and sends the signatures to the search engine 902. A command CHKHSH checks the retrieved digital signature for a given URL against a prior digital signature and returns TRUE if the digital signatures are the same, FALSE if not the same, or MISSING if the URL no longer exists. A command GETHLS retrieves a list of the valid URLs available and their associated digital signatures, and a command GETLSH retrieves the digital signature of the URL list.
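The semantics of the four sample commands can be modeled with ordinary functions over an in-memory site, as sketched below. This is not an HTTP implementation: the site is assumed to be a mapping from URL to page content, SHA-256 stands in for the unspecified digital signature, and the textual layout hashed by `getlsh` is an arbitrary choice made for the example.

```python
import hashlib
from typing import Dict, Optional


def gethsh(site: Dict[str, bytes], url: str) -> Optional[str]:
    """GETHSH: the digital signature for a URL, or None if it does not exist."""
    content = site.get(url)
    return None if content is None else hashlib.sha256(content).hexdigest()


def chkhsh(site: Dict[str, bytes], url: str, prior: str) -> str:
    """CHKHSH: compare the current signature of a URL against a prior one."""
    current = gethsh(site, url)
    if current is None:
        return "MISSING"
    return "TRUE" if current == prior else "FALSE"


def gethls(site: Dict[str, bytes]) -> Dict[str, str]:
    """GETHLS: the valid URLs and their associated digital signatures."""
    return {url: hashlib.sha256(site[url]).hexdigest() for url in sorted(site)}


def getlsh(site: Dict[str, bytes]) -> str:
    """GETLSH: a single digital signature covering the whole URL list."""
    listing = "\n".join(f"{url} {sig}" for url, sig in gethls(site).items())
    return hashlib.sha256(listing.encode("utf-8")).hexdigest()
```

Because `getlsh` covers both the URLs and their signatures, any added, removed, or modified page changes its result, which is what lets a search engine skip a site entirely when the value is unchanged.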

Using the above command set, the search engine 902 need not request the entire contents of a page if that page has already been processed. Furthermore, there is no need to “spider” a site. Instead, the web server 908 provides the valid list of URLs which can then be directly retrieved. As an example, consider the following programmatic steps from the point of view of a search engine. First, given a web host 908, fetch the digital signature of the URL list. If the digital signature does not match a prior digital signature for the list, fetch the list of URLs from the web server. Thereafter, compare the list of URLs at the client web server 908 just retrieved to those stored locally at the search engine 902. From this comparison, a list of changed URLs is determined. The URLs that have changed are then retrieved and parsed for keyword and other indexing information. Once the indexing information is obtained, all URLs that appear in the prior list but not in the retrieved list are deleted from the search index on the search engine 902.
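The steps just listed can be expressed as one planning function on the search engine side. The sketch below is illustrative only: `plan_reindex` and its two callables are hypothetical stand-ins for whatever requests the search engine would issue to fetch the list signature and the URL list.

```python
from typing import Callable, Dict, List, Tuple


def plan_reindex(
    prior_list: Dict[str, str],
    prior_list_signature: str,
    fetch_list_signature: Callable[[], str],
    fetch_list: Callable[[], Dict[str, str]],
) -> Tuple[List[str], List[str]]:
    """Return (URLs to retrieve and parse, URLs to delete from the index)."""
    # Step 1: a matching list signature means nothing on the host changed.
    if fetch_list_signature() == prior_list_signature:
        return [], []
    # Step 2: fetch the current URL list with per-URL signatures.
    current_list = fetch_list()
    # Step 3: URLs that are new or whose signature differs must be re-indexed.
    changed = sorted(
        url for url, sig in current_list.items() if prior_list.get(url) != sig
    )
    # Step 4: URLs in the prior list that no longer exist are deleted.
    removed = sorted(url for url in prior_list if url not in current_list)
    return changed, removed
```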

From the above description, one skilled in the art will appreciate that it is not necessary to retrieve all pages on the web site for every indexing process. Full retrieval of all web pages is necessary only once or if the entire site changes. This has several effects, the most important being that the amount of information transmitted is drastically reduced. The above method is but one possible implementation or embodiment. In another embodiment, a list of URLs on the search engine could be used and the individual checking of web pages done using the commands given. For example, the search engine 902 could tell if a page was current by simply retrieving its signature. If current, no other activity is required. Otherwise, the page might be deleted if no longer present or re-indexed if it has changed.

All content from a single agent/site could be searched for by a peer system user using the agent name. The search results could then be displayed to the user in a dynamically created “home page” for the content provider identified by that agent name. The dynamic home page would include a listing of every item indexed by the agent with that agent name and the item titles would be displayed along with their descriptions.

In a conventional search engine, the search engine normally requests that a web server deliver HTML documents to the search engine, regardless of whether the contents of the page have changed since the last recursive search. This is wasteful not only of CPU resources, but very wasteful of bandwidth which is frequently the most valuable resource associated with a web site. Thus, current search engines and content directories require regular retrieval and parsing of internet-based documents such as web pages. Most search engines use a recursive retrieval technique to retrieve and index the web pages, indexing first the web page retrieved and then all or some of the pages referenced by that web page. At present, these methods are very inefficient because no attempt is made to determine if the information has changed since the last time the information was retrieved, and no map of the information storage is available. For example, a web server does not provide a list of the available URLs for a given web site or series of sites stored on the server. Secondly and most importantly, the web server does not provide a digital signature of the pages available which could be used to determine if the actual page contents have changed since the last retrieval.

Another alternative embodiment of the process just described is the automated distribution of a single web site across multiple servers. For example, a web site would be published to a single server. Periodically, a number of other servers would check the original server to see if any pages have been added, removed or changed. If so, those pages would be fetched and stored on the requesting server. Another alternative embodiment is the construction of meta indexes generated as lists of URLs from many different web servers. Such a meta index would be useful as a means of providing central directory services for web servers or the ability to associate sets of descriptive information with sets of URLs. The method could also be used to create directory structure maps for web sites, as will be appreciated by one skilled in the art.

The indexing system 200 may be used not only on the global communications network but on corporate intranets as well. A typical corporate intranet includes a central location, such as a corporate headquarters, at which a central searchable database is maintained, and a number of remote locations, such as regional offices or stores, coupled to the central location through a network or intranet. Each remote location transfers data to the central location for storage in the central database. The remote locations may also search the central database for desired information.

In transferring data from each remote location, data is typically stored at the remote location and then transferred to and replicated at the central location. One of four methods is generally used to update the central database, as previously discussed under the Background section. First, all remotely stored data is copied over the intranet to the central location. Second, only those files or objects that have changed since the last transfer are copied to the central location. Third, a transaction log is kept at the remote location and transmitted to the central location, and the transaction log is then applied at the central location to update the central database. Finally, at each remote location a prior copy of the local data is compared to the current copy of the local data to generate a differential record indicating changes between the prior and current copies, and this differential record is then transferred to the central location and incorporated into the central database.

Each of these methods relies on duplicating the remote data, which can present difficulties. For example, redundant hardware at the remote and central locations must be purchased and maintained for the storage and transfer of the data over the intranet. Data concurrency problems may also arise should transmission of differential data from the remote locations to the central location be unsuccessful or improperly applied to the central database. Furthermore, if the intranet fails, all operations at remote locations may be forced to cease until communications are reestablished. A further difficulty is the author's loss of authority over his document and the responsibility for retention and data management decisions. In a centralized intranet, unregulated retrieval of objects from the central database to local storage can create version control problems. Difficulty in handling revisions to an object may also arise in such a centralized system, with simultaneous revision attempts possibly causing data corruption or loss. Finally, in a centralized system the size of the central database can grow to the point where management of the data becomes problematic.

With the architecture of the indexing system 200, everything, including each field in a local database, is treated as an object. Instead of copying each object to a central location, an object reference is created at each local site and sent to a cataloging location or locations. The objects are not duplicated in a monolithic central database. One advantage to this architecture is that the decision of whether to expose the existence and classification of local objects becomes the responsibility and choice of the author, rather than a generic decision. In the system 200, the implementation of retention rules and the physical location of the objects remain with the author. The searchable central catalog merely references the distributed objects, eliminating the need to make full copies and therefore manage a large storage system. Each local site generates and transfers information to the central server 202, or to a plurality of central servers for use in a searchable catalog.

FIGS. 10 and 11 are diagrams illustrating operation of a distributed accounting and inventory system on an intranet 1000 according to one embodiment of the present invention. In FIG. 10, the intranet 1000 includes three different physical locations 1002, 1004, and 1006 including catalogs 1008, 1010, and 1012, respectively. Each location 1002-1006 also includes a source of objects (not shown in FIG. 10) that corresponds to an inventory of items at that location. The sources of objects, or sources, for the locations 1002, 1004, 1006 are designated sources 1002, 1004, and 1006, respectively, in records of the respective catalogs 1008-1012. In the example of FIG. 10, the source 1006 is empty (i.e., no inventory items at location 1006). Each of the catalogs 1008-1012 is a catalog of object references to objects in the source at the corresponding location and to objects at the other locations. For example, the catalog 1010 at location 1004 includes a record for part no. 1, which is part of the inventory or source 1004 at this location. The catalog 1010 further includes an object reference, as indicated by the arrow 1014, for part no. 3, which is part of the inventory or source 1002 at location 1002. The catalog 1010 does not store a duplicate copy of the information in the record for part no. 3, but instead merely stores a reference to that object.

FIG. 11 is another diagram of the intranet 1000 expressly illustrating the sources 1002-1006 on the locations 1002-1006, respectively. The source 1006 is shown as containing no objects, such as may be the situation where the location 1006 is at a headquarters of a corporation. The sources 1002 and 1004 each include objects or inventory items, such as where these locations are remote offices of the corporation. This example illustrates that records for objects are not duplicated on each location 1002-1006, but instead object references in each of the catalogs 1008-1012 point to objects stored in remote sources.
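
The relationship between sources and catalogs in FIGS. 10 and 11 may be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the names ObjectRef, Location, and resolve, and the quantity values, are hypothetical and do not appear in the specification. Each location holds its own objects in a source, while every catalog holds only references that are resolved at the owning location when needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectRef:
    """Reference to an object held in the source at some location."""
    location: int  # reference numeral of the location holding the object
    key: str       # identifier of the object within that source


class Location:
    def __init__(self, number):
        self.number = number
        self.source = {}   # key -> record; the objects physically held here
        self.catalog = {}  # key -> ObjectRef; references only, never copies

    def add_object(self, key, record):
        """Store an object locally and catalog a reference to it."""
        self.source[key] = record
        self.catalog[key] = ObjectRef(self.number, key)

    def add_reference(self, ref):
        """Catalog a reference to an object held at another location."""
        self.catalog[ref.key] = ref


def resolve(locations, ref):
    """Read the referenced record from the location that owns it."""
    return locations[ref.location].source[ref.key]


# Hypothetical inventory mirroring the example of FIG. 10.
locations = {n: Location(n) for n in (1002, 1004, 1006)}
locations[1002].add_object("part no. 3", {"quantity": 12})
locations[1004].add_object("part no. 1", {"quantity": 40})
locations[1004].add_reference(locations[1002].catalog["part no. 3"])
locations[1006].add_reference(locations[1002].catalog["part no. 3"])
locations[1006].add_reference(locations[1004].catalog["part no. 1"])
```

In this sketch the location 1006 has an empty source yet can still report on both parts, because its catalog entries are resolved by reading from the locations 1002 and 1004.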

The intranet 1000 provides several advantages in accounting or inventory control applications, and others. A conventional intranet requires the centralization of the catalog for purposes of control. The intranet 1000 separates the control of the physical inventory (objects in the sources 1002-1006) from accounting control. Since the whole intranet includes only objects and object references, central reporting and planning can occur at the location 1006, but such reporting merely corresponds to data being read from the remote locations 1002, 1004, and no data is modified. In the intranet 1000, each location 1002-1006 functions as both a server and a client, and minor latency between the locations is not critical because within each location accounting and physical control remain linked. Latency need be considered only where authority to sell or transfer inventory (objects in the sources 1002-1006) is separate from the physical control of the inventory.

With the intranet 1000, the author of an object has physical control over that object and thus may decide what objects are to be exposed for searching by other locations. As a result, the intranet 1000 is well suited for high-security management systems that typically require elaborate security procedures to prevent unauthorized duplication of data. For example, assume there are 200 remote information generators (offices, salespeople, etc.). With the intranet 1000, data access to information in the objects is maintained through the use of the references available to both the central location and the remote locations.

The intranet 1000 also provides a more effective means to organize and describe organizational data, creating a much more flexible environment for data retention handling. A data retention handling system has two primary goals: 1) eliminate obsolete data to prevent confusion with current data and reduce storage requirements; and 2) reduce liability. Typically, hierarchical storage management (“HSM”) systems have been used for these purposes. An HSM system stores frequently used or relatively new files on high-speed, immediately available, and most expensive storage media. Older files or files that are not as frequently used are stored on “near-line” storage media that may consist of automatically mounted tape drives or CD-ROMs. Old files or files that are almost never used are stored off-line on tape or other inexpensive high-capacity media. Some files may eventually be deleted if they fall within certain parameters of usage, type, or age. The intranet 1000 overcomes these potential difficulties of an HSM system. For example, in the intranet 1000 duplicate copies of records are not maintained at each location, thereby eliminating the need for hierarchical storage media to provide the required access to stored records.

The agent 204 may also generate ratings for objects stored on the associated sites so that users may filter their searches based upon the generated ratings. For example, in one embodiment, an owner of a web site provides a rating of his site, such as a “G,” “R,” or “X” rating. In addition, the web host, on which the agent 204 runs, also provides a rating that the host believes applies to the site. The agent 204 then parses the pages on the site and looks for adult content “trigger” words, such as “XXX” or “XXX-Rated.” If the agent 204 finds enough occurrences of such trigger words, the agent “flags” the web site for review to determine the correct rating for the site. To rate the site, the agent 204 compares the words in the web pages to words in a list of ratings values. The list of ratings values may be, for example, words that are generally found on adult web sites, such as profane and sexually explicit words. The list of ratings values may be generated by a human or may be automatically generated by the agent 204. To automatically generate the list, the agent 204 could, for example, parse known adult web sites. Such known adult web sites could be identified by determining those sites in the catalog that include the phrases “adult content” or “X-rated.” Once these sites are identified, the agent parses the pages and determines frequently used words on such pages, and may also determine the frequency with which such words occur on these pages. The frequently used words and associated frequencies are then compiled to form the list of ratings values. After flagging web sites for review, the review may be either through human review of the web site or through automated review performed by the agent 204. In automated review of flagged web sites, the agent 204 could, for example, determine the frequency of occurrence of words in the list of ratings values, and then set the rating of the web site as a function of the frequency. For example, if the frequency is greater than some threshold T1, the web site is rated “R,” and if greater than a second threshold T2, where T2>T1, the site is rated “X.”
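
The automated review with thresholds T1 and T2 may be sketched as follows. This is a minimal sketch under stated assumptions: frequency is taken to be the fraction of words on a page that appear in the list of ratings values, the default rating below T1 is assumed to be “G,” and the function names and threshold values are hypothetical rather than taken from the specification.

```python
import re


def trigger_frequency(text, ratings_words):
    """Fraction of words in the text that appear in the list of ratings values."""
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in ratings_words)
    return hits / len(words)


def rate_site(text, ratings_words, t1=0.05, t2=0.20):
    """Rate a site "R" above threshold T1 and "X" above threshold T2 (T2 > T1)."""
    if not t2 > t1:
        raise ValueError("T2 must be greater than T1")
    frequency = trigger_frequency(text, ratings_words)
    if frequency > t2:
        return "X"
    if frequency > t1:
        return "R"
    return "G"  # assumed default when neither threshold is exceeded
```

The thresholds 0.05 and 0.20 are placeholders; a deployed system would presumably tune them against sites whose ratings are already known.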

One proposed system for rating web pages on the Internet is described in A Best Practices Model by Members of the Information Society Project at Yale Law School, J. M. Balkin, Beth Simone Noveck, Kermit Roosevelt (Jul. 15, 1999), which may be found at http://webserver.law.vale.edu/infosociety/. In this proposed system, three layers are implemented to provide for rating web pages. The first layer includes a basic vocabulary of, for example, thirty to sixty terms that are used in rating a web page by a first party, typically the owner of the site containing the web page. The second layer includes rating templates developed to reflect a particular ideology. Third parties, such as the NAACP or Christian Coalition, would develop such templates to reflect a particular value system. The templates would include terms in the basic vocabulary being categorized and scalar values assigned to each item to reflect the value system. Finally, in layer three, individuals could customize or modify a template to suit their individual values. For example, a template developed by the Christian Coalition could be further modified to include scalar values for web sites designated as racist by the NAACP.

The indexing system 200 could utilize such a rating system to perform filtering of search results at the central server 202. For example, users' browsers could be registered with the central server 202, and part of this registration would include selection of a template and any desired modifications to the selected template. Thereafter, whenever the user's browser applies a search query to the central server 202, the browser registration is identified and the search results generated in response to the query are “filtered” according to the template and any template modifications associated with the registered browser.
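
One way such template-based filtering could work is sketched below. The specification does not say how scalar values are combined or compared, so the scoring rule here (sum the scalar values of the vocabulary terms attached to a result and drop results above a limit) is an assumption, as are the function names, the example terms, and the example sites.

```python
def effective_template(template, modifications):
    """Apply a user's modifications on top of the selected third-party template."""
    merged = dict(template)
    merged.update(modifications)
    return merged


def filter_results(results, template, modifications=None, limit=0):
    """Keep results whose summed scalar values do not exceed the limit.

    results: list of (url, terms), where terms is the set of basic-vocabulary
    terms the site owner assigned to the page.
    """
    weights = effective_template(template, modifications or {})
    kept = []
    for url, terms in results:
        score = sum(weights.get(term, 0) for term in terms)
        if score <= limit:
            kept.append(url)
    return kept
```

A registered browser would be associated with one template and one set of modifications, so the central server could call filter_results with those two values on every query from that browser.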

The indexing system 200 also may perform adult-content locking. In conventional search engines, adult-content web sites are automatically provided in response to applied search queries. The only way for a user to filter adult-content is through a filter on his browser. Thus, current search engines are “opt-in” only in that the search engine does not preclude adult-content pages from being returned in response to applied search queries. Conversely, in one embodiment of the indexing system 200, the user is automatically opted out of receiving adult-content web pages in response to applied search queries. The user must reverse this default “opt-out” status and elect to receive adult-content web pages in the system 200. This could be done, for example, by registering a browser with the system 200 so that when the registered browser is identified adult-content web sites will be returned in response to applied search queries. Alternatively, a machine level lock using the computer or machine identification, such as the CPU or Windows identification number, could be utilized. In this approach, regardless of the browser being utilized on the computer, adult-content is either returned or not returned in response to applied search queries. This approach may be particularly desirable for parents who want to preclude their children from accessing adult-content since a child cannot merely use a new browser on the same machine and thereby circumvent the filter the parent has on his or her browser.

The indexing system 200 may also perform ranking of web pages having references in the central index. First, the agent 204 may perform positional and contextual rankings for particular words in the web pages on a site. The positional rankings assign a ranking value to a word based upon, for example, the location of the word in the web page and the position of the word relative to other words in the page. The contextual ranking is determined using contextual information about the site contained in the brochure 206. For example, if a word in a web page corresponds to a category as listed in the brochure 206, the word will be assigned a higher ranking. In addition to rankings generated by the agent 204, the central server 202 also generates rankings for the central index. For example, the central server 202 may generate rankings based upon whether a page is a source or reference to the desired data. Rankings may also be determined based upon user input such as the usage or popularity of a site as measured by how often the site is linked as the source site in other sites, or through positive comments entered by users about the context or ranking of a site. All the methods of ranking just described are known as static rankings, meaning that the ranking is determined before a particular search query is applied.

In addition to static rankings at the central server 202, the central server may also perform dynamic ranking of search results. Dynamic rankings are a function of the applied search query, and thus are not predetermined or independent of the query. For example, if the applied search query is “red barn,” the word “barn” is probably more important than “red” so search results including the word “barn” will have their ranking increased relative to those containing only the word “red.” Furthermore, ratings could be applied to search queries to create another type of dynamic ranking at the central server 202. Finally, a user may select which ones of the previous methods of rankings should be applied in ranking search results generated in response to his applied query. For example, a user could specify that his search results are to be ranked only on the basis of popularity, or only on the basis of positional and contextual rankings and the applied search query. As another example of dynamic ranking, using the information in the brochure 206, the search results can be ranked dynamically based on the geographic distance from the searcher.
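
The “red barn” example may be sketched as follows. The specification does not say how the relative importance of query words is determined; this sketch substitutes a smoothed inverse document frequency, so that a word found in fewer documents carries more weight. The function names, the document set, and the way static and dynamic scores are added are all assumptions made for illustration.

```python
import math


def term_weights(query_terms, documents):
    """Weight each query term by a smoothed inverse document frequency."""
    total = len(documents)
    weights = {}
    for term in query_terms:
        containing = sum(1 for words in documents.values() if term in words)
        weights[term] = math.log((total + 1) / (containing + 1)) + 1.0
    return weights


def dynamic_rank(query, documents, static_rank=None):
    """Order matching documents by query-dependent score plus any static rank."""
    static_rank = static_rank or {}
    terms = query.lower().split()
    weights = term_weights(terms, documents)
    scored = []
    for url, words in documents.items():
        score = sum(weights[term] for term in terms if term in words)
        if score > 0:
            scored.append((score + static_rank.get(url, 0.0), url))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored]


# Hypothetical index content: url -> set of words on the page.
documents = {
    "a": {"red", "barn"},
    "b": {"barn", "farm"},
    "c": {"red", "car"},
    "d": {"red", "paint"},
    "e": {"blue", "sky"},
}
ranking = dynamic_rank("red barn", documents)
```

Because “barn” occurs in fewer of these documents than “red,” a page containing only “barn” ranks above pages containing only “red,” and a page containing both ranks first.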

The server architecture of the system 200 will now be described. The server architecture provides a number of services which support the management and use of index information. The system is divided into several components which can be run on different machines, as needed, in a truly distributed architecture. The design must scale well and be self-healing wherever possible. To make this possible, Jini technology plays an important role in the architecture and services are exposed using that infrastructure. As components are brought online, they advertise their existence to the local Jini lookup service. This information is automatically propagated to services that need access to other services, and handshaking brings elements into the Jini community as they are announced. If non-critical parts of the system become unavailable, the system is able to compensate by distributing load to other machines hosting the necessary services.

A load balancer allows round-robin distribution of incoming traffic to web servers and the agent listener. The web servers provide user services like account registration, agent downloads, brochure management, and search capabilities. The AgentListener is a secure socket listener that manages agent connections. One of the components is a UserAccessService, which controls access to the BrochureService. Users can make queries on the search index. These are handled by the QueryDispatchManager, which delegates subqueries to appropriate IndexSegmentServices. Incoming information from agents is added to the MessageQueueService and popped off by the UpdateManagerService, which coordinates information in the BrochureService to ensure we have the latest updates. Agent-collected changes are added and/or removed in the MasterIndexService.

FIG. 20 shows request/response flow with the direction of arrows. The intent is to make clear who is asking for the execution of respective services. The web server, serving up static and dynamic content through Servlets and Java Server Pages, can communicate with the UserAccessService, BrochureService and the QueryDispatchService, but nothing else. The AgentListener can talk to the UpdateManagerService and the MessageQueueService only. An IndexSegmentService is able to initialize itself by asking for information from the MasterIndexService. Finally, the UpdateManagerService can talk to the BrochureService, MessageQueueService and the MasterIndexService. Its job is to keep the MasterIndexService up to date by processing incoming agent messages.

Because we are using Jini, the order in which services are brought up determines which services can operate at a given moment, but the order itself is not restricted in any way. If an UpdateManagerService is unavailable, for example, the system will not process updates from the message queue, but processing will resume as soon as the UpdateManagerService is brought up again. As long as more than one instance of a given service is available, the system can discover those services automatically, as they are brought online. An IndexSegmentService is associated with a given IndexSegmentRange, which determines the prefix character range for the index content.

When an IndexSegmentService is brought online, it automatically becomes available to the QueryDispatchService. If one of these services is reinitialized periodically, the update will be completely transparent, so long as other IndexSegmentService instances cover the same IndexSegmentRange. The range might be covered by a single server or may be distributed arbitrarily across a number of IndexSegmentService instances. So long as a QueryDispatchService instance is available to the web servers, and sufficient IndexSegmentService instances are available to cover the full range of possible tokens, the system is capable of executing queries.
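
The relationship between index segments, prefix ranges, and query dispatch may be sketched as follows. The class names echo the services named above, but the methods, the lowercase-letter token alphabet, and the rule that a multi-token query returns the intersection of per-token results are assumptions made for illustration. Jini discovery is not modeled; a plain register call stands in for a service coming online.

```python
import string


class IndexSegmentService:
    def __init__(self, first, last, postings):
        self.first, self.last = first, last  # inclusive prefix character range
        self.postings = postings             # token -> collection of site ids

    def covers(self, token):
        return self.first <= token[0] <= self.last

    def lookup(self, token):
        return set(self.postings.get(token, ()))


class QueryDispatchService:
    def __init__(self):
        self.segments = []

    def register(self, segment):
        """Stand-in for a segment service being brought online."""
        self.segments.append(segment)

    def can_execute(self, alphabet=string.ascii_lowercase):
        """True when the registered segments cover every prefix character."""
        return all(
            any(seg.first <= ch <= seg.last for seg in self.segments)
            for ch in alphabet
        )

    def query(self, tokens):
        """Delegate one subquery per token and intersect the results."""
        result = None
        for token in tokens:
            segment = next((s for s in self.segments if s.covers(token)), None)
            if segment is None:
                raise LookupError("no segment covers token " + repr(token))
            hits = segment.lookup(token)
            result = hits if result is None else result & hits
        return result or set()
```

In this sketch a second segment registered over the same range would simply be an alternative target for the same subqueries, which is what allows one instance to be reinitialized while another continues to serve.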

The data structures are critical to the correct operation of a complex system. The following description outlines the more important structures that represent the means by which subsystems may interact or store their information persistently in the system 200.

Persistent information is stored in a database or in temporary files on the system 200. The database tables relate to each other as shown in FIG. 21.

The packages presented in FIG. 22 are directly associated with services, components, or conceptual groupings in the system 200. Major services are represented by their own package, with supporting classes included. Components are given separate packages where applicable. Some services and components accomplish the same tasks and are, naturally, in the same package. Supporting classes, such as database, networking and servlets, are grouped into conceptual packages for clarity.

Note that the packages are currently presented in alphabetical order, but may be reorganized in a later revision to reflect the three-tiered nature of the architecture of the system 200. Low-level utility packages should be listed first, followed by component/manager packages, Jini service packages, and finally independent applications.

In FIG. 23, packages are categorized in three ways. They are either low-level utility packages, components, applications and services or user interface elements. Support packages, like the database, catalog, html and xml packages, provide a foundation for other program functionality. A few of the services, the message and index services, for example, are grouped as shared because several of their classes provide functional capabilities between both the agent and server elements. The brochure package is also shared. The application and service level packages construct the agent and the server-side Jini services. Taken together, the classes in these packages function together as a complete, integrated, distributable system.

Referring to FIG. 23, user interface elements are grouped into the following packages. The com.activeindexing.ui.app package contains classes related to console-based interfaces.

The com.activeindexing.ui.app package contains classes related to web-based user interfaces and contains classes related to application user interfaces.

The agent 204 has its own package, as shown in FIG. 24, primarily for distribution reasons.

The agent package, com.activeindexing.agent, contains classes related to the host agent.

Referring to FIG. 23, the collection of server packages provides high-level server-side Jini services to the system.

FIG. 25 illustrates the com.activeindexing.server.access package, which contains classes related to the UserAccessService.

The com.activeindexing.server.database package of FIG. 23 contains classes related to database access and record handling and is shown in more detail in FIG. 26.

Referring to FIG. 23, the com.activeindexing.server.query package contains classes related to the QueryDispatchService, as shown in more detail in FIG. 27.

The com.activeindexing.server.servlet package contains classes related to Servlets and web servers, as shown in more detail in FIG. 27.

The com.activeindexing.server.update package of FIG. 23 contains classes related to the update manager, as shown in more detail in FIG. 28.

Referring to FIG. 23, the shared package contains elements which can act as components within the system, used by one or more services or applications.

The com.activeindexing.shared.brochure package is shown in more detail in FIG. 29 and contains classes related to Brochure handling.

The com.activeindexing.shared.index package of FIG. 23 contains classes related to indexing and includes the IndexSegmentService as shown in more detail in FIG. 31.

The com.activeindexing.shared.message package of FIG. 23 contains classes related to the MessageQueueService as shown in more detail in FIG. 32.

The com.activeindexing.shared.rating package of FIG. 23 contains classes related to rating systems, as shown in more detail in FIG. 33.

The com.activeindexing.shared.schedule package of FIG. 23 contains classes related to the ScheduleManager, as shown in more detail in FIG. 34.

The com.activeindexing.shared.signature package of FIG. 23 contains classes related to the file signatures and hash calculations, as shown in more detail in FIG. 34.

The com.activeindexing.shared.validate package of FIG. 23 contains classes related to field validation, as shown in more detail in FIG. 35.

Referring to FIG. 23, the document-related package com.activeindexing.doc.html contains classes related to HTML tokenizing and parsing, as shown in more detail in FIG. 36.

The com.activeindexing.doc.report package of FIG. 23 contains classes related to reporting, as shown in more detail in FIG. 37.

The XML package of FIG. 23, com.activeindexing.doc.xml, contains classes related to XML file management as shown in more detail in FIG. 38.

The utility package of FIG. 23 contains low-level utility packages which can be used by any other package.

The config package, com.activeindexing.util.config, contains classes related to configuration file handling, as shown in more detail in FIG. 39.

The I/O package of FIG. 23, com.activeindexing.util.io, contains utility classes related to input/output operations as shown in more detail in FIG. 40.

The jini package of FIG. 23, com.activeindexing.util.jini, contains classes related to Jini services as shown in more detail in FIG. 41.

The log package of FIG. 23, com.activeindexing.util.log, contains classes related to the log files, as shown in more detail in FIG. 42.

The network package of FIG. 23, com.activeindexing.util.net, contains utility classes related to networking, as shown in more detail in FIG. 43.

The snmp package of FIG. 23, com.activeindexing.util.snmp, contains classes related to the Simple Network Management Protocol, as shown in more detail in FIG. 44.

The above description does not include a user interface, the XML subsystem, transactions for change requests, or a message format, but one skilled in the art will understand suitable implementations for each of these components.

FIG. 45 is a functional data flow diagram illustrating an alternative embodiment of the central cataloging site of FIG. 2. In FIG. 45, a web server 4700 is the main gateway for all agent 204 program update requests, agent program downloads, and search requests. An update batch processor 4702 receives, stores, and applies update batches created by remote agents 204, and also transmits copies of the batches to redundant remote catalog sites. A remote update batch processor 4704 receives and applies batches received from a master catalog site to a local index server for the purposes of redundancy. An index server 4706 stores all search index information in a series of database segments, and creates result sets from queries applied to it as a result of search requests received by the web server 4700.

The system of FIG. 45 includes an agent program storage area 4708 containing copies of agent 204 programs and the digital signatures of those programs for the various host operating systems which use agents to generate web site updates. An update batch storage area 4710 contains the received update batches transmitted by agent programs 204 on remote hosts, and these batches are deleted after processing. An index segment storage area 4712 contains a subset of the total index database for the index server 4706. For example, a single segment might contain the keyword fields for all of the keywords beginning with the letter “A”. Typically, these storage areas will be placed on high-speed RAID storage systems. An index segment storage twin area 4714 is identical to the storage area 4712. The purpose of the twin area 4714 is to provide access to existing index information while the corresponding index segment storage area is being updated. This permits updates to be applied to a segment without requiring record locking. The index server 4706 is simply notified as to which segment areas are available for search processing. Once updated, the area 4712 or 4714 becomes available again. An index signature storage area 4716 stores the current digital signature of the index for a particular site serviced by an agent 204 on a remote host.

In operation of the system of FIG. 45, the agent program, upon starting on a remote host, will query the web server 4700 to determine if the local agent program digital signature matches that of the agent program digital signature stored at the catalog site. If the local agent 204 program determines that the digital signatures of the agent programs do not match, the agent program will retrieve a new copy of itself from the web server 4700 and restart itself after performing the appropriate local operations. Before commencing local processing, the agent program 204 checks the digital signature of the existing site index on the catalog site against the digital signature of the site stored locally. If the two signatures match, a differential transmission of catalog information will occur. Otherwise, the entire catalog will be regenerated and transmitted, and the catalog site will be instructed to delete any existing catalog entries for the site. Once a differential or full catalog update has been generated, the agent program 204 contacts the update batch processor 4702 at the catalog site and transmits the contents of the update. Upon receiving confirmation of receipt, the agent program 204 performs clean up and post-processing operations, then suspends itself until the next processing cycle.
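
The choice between a differential and a full catalog update may be sketched as follows. This is an illustration under stated assumptions: the digital signature is modeled as a SHA-256 digest over the sorted catalog entries, the catalog is modeled as a mapping from object path to content hash, and the function names and the shape of the update batch are hypothetical.

```python
import hashlib


def signature(entries):
    """Digest over the sorted catalog entries (assumed form of the signature)."""
    digest = hashlib.sha256()
    for key in sorted(entries):
        digest.update(key.encode("utf-8") + b"\0")
        digest.update(entries[key].encode("utf-8") + b"\0")
    return digest.hexdigest()


def build_update(previous, current, catalog_site_signature):
    """Build a differential batch if the signatures match, else a full batch.

    previous: catalog entries as of the last transmitted update
    current:  catalog entries generated during this processing cycle
    """
    if signature(previous) == catalog_site_signature:
        changed = {k: v for k, v in current.items() if previous.get(k) != v}
        removed = sorted(k for k in previous if k not in current)
        return {"mode": "differential", "upsert": changed, "delete": removed}
    # Signatures differ: resend everything and ask the catalog site to
    # delete its existing entries for this site first.
    return {"mode": "full", "upsert": dict(current), "delete_existing": True}
```

The differential batch carries only new, changed, and removed entries, so a site with few changes transmits little data on each cycle.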

The update processor 4702 periodically updates the index segments on the index server 4706. All updates received are applied as batches to retain data integrity on the index server 4706. The update processor 4702 separates update information as required to match the segments on the index server 4706, then updates each segment storage area 4712 and each segment storage twin area 4714. While a segment storage area 4712, 4714 is being updated, its counterpart is available for search request processing. Once all updates have been applied, the digital signature of the index for the site is updated in the index signature storage area 4716 and the batch is deleted from the update batch storage area 4710.
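
The twin storage areas may be sketched as follows. This is a single-threaded illustration only: the class name TwinSegment, the in-memory dictionaries standing in for the storage areas 4712 and 4714, and the optional probe callback (used here to observe what a search returns while one area is offline) are all assumptions.

```python
class TwinSegment:
    def __init__(self, postings):
        # Index 0 stands in for storage area 4712, index 1 for twin area 4714.
        self.areas = [dict(postings), dict(postings)]
        self.available = {0, 1}

    def search(self, token):
        """Serve the request from any area currently marked available."""
        index = min(self.available)
        return self.areas[index].get(token, [])

    def apply_batch(self, batch, probe=None):
        """Update each area in turn; its counterpart keeps serving searches."""
        for index in (0, 1):
            self.available.discard(index)  # take this area offline
            self.areas[index].update(batch)
            if probe is not None:
                probe()                    # e.g. a search arriving mid-update
            self.available.add(index)      # area becomes available again
```

Because a search only ever reads an area that is not being written, no record locking is needed; a search arriving during the first half of the update sees the old content and one arriving during the second half sees the new content.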

In processing search requests, the web server 4700 receives and interprets the search requests from remote portals or web browsers. Each search request is preprocessed to divide the request into sub-requests as required for each index segment, then the index server 4706 is requested to perform search queries on each relevant segment. More than one index segment may be queried simultaneously. The index server 4706 determines which index segment storage areas 4712, 4714 are available for use, applies the search request, and transmits the results to the web server 4700 which, in turn, collects and collates all search results and transmits these results back to the requesting system in a formatted manner.

According to another embodiment of the agent 204, the agent calculates a value representing the distance and text between objects and thereby determines which objects at a site are most likely to relate to each other. At the catalog site, these relationship values are combined with the relationship values from other sites to create a relationship value table. This relationship value table represents the likelihood of an object occurring together with another object. This table may be used to refine searches and create relevance ranking.
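
One possible form of the relationship values is sketched below. The specification does not define the calculation, so this sketch makes several assumptions: objects are modeled as words in a token sequence, the value for a pair is the sum of inverse distances between occurrences within a fixed window, and the catalog site combines per-site values by addition. The function names are hypothetical.

```python
from collections import Counter


def site_relationships(tokens, window=3):
    """Per-site relationship values: closer pairs contribute larger values."""
    values = Counter()
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            right = tokens[j]
            if left != right:
                pair = tuple(sorted((left, right)))
                values[pair] += 1.0 / (j - i)
    return values


def combine(site_tables):
    """Catalog-site step: merge per-site values into one relationship table."""
    table = Counter()
    for values in site_tables:
        table.update(values)
    return table
```

A search for one object could then be refined by suggesting the objects with the highest combined values in that object's rows of the table.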

It is to be understood that even though various embodiments and advantages of the present invention have been set forth in the foregoing description, the above disclosure is illustrative only, and changes may be made in detail, and yet remain within the broad principles of the invention.

Therefore, the present invention is to be limited only by the appended claims.

1. A computer system for peer-to-peer file sharing comprising: a server computer having an index of information about files that reside on a plurality of peer computers, the server computer selectively coupled to the plurality of peer computers by a network; a plurality of agent programs running on at least two of the plurality of peer computers and operable to transmit index information to the server computer and operable to transmit a requested file; and a file sharing program running on the server computer and operable to perform centralized searches based on the index information transmitted to it across the network by the agent programs, to store peer computer file requests identifying a requested file transmitted across the network by a requesting peer computer of the plurality of peer computers, the requested file being referenced in the index information, and to direct a peer-to-peer computer connection for the purpose of exchanging files between the requesting peer computer and a storing peer computer storing the requested file upon detection of connection of the storing peer computer to the server.
2. The computer system of claim 1, wherein the server computer is also a peer computer.
3. The computer system of claim 1, wherein the contents of at least some of the files stored on a peer computer comprise nontextual data having meta data that comprises one or more vectors extracted from the files.
4. The computer system of claim 1, wherein each agent program creates the index information for selected files on each corresponding peer computer, the selected files selected by user input.
5. The computer system of claim 1, wherein each agent program creates the index information for selected files on each corresponding peer computer, the selected files selected by a computer algorithm.
6. The computer system of claim 1, wherein the agent program comprises a utility program that is resident in an operating system on the corresponding peer computer.
7. The computer system of claim 1, wherein the file sharing program comprises a utility program that is resident in an operating system on the corresponding server computer.
8. The computer system of claim 1, wherein the file sharing program is further operable to direct transfer of the requested file from the storing peer computer to the server upon detection of connection of the storing computer to the server when the requesting peer computer is not connected to the server, the file sharing program operable to transmit the requested file from the server to the requesting peer computer upon detection of connection of the requesting peer computer to the server.
9. A computer system for peer-to-peer file sharing comprising: a server computer storing an index of information about files that reside on a plurality of peer computers, the server computer selectively coupled to the plurality of peer computers by a network; wherein the server computer stores a file sharing program that is operable to: perform centralized searches based on index information transmitted to it across the network by agent programs running on two or more of the plurality of peer computers, each agent program operable to transmit index information regarding one or more files to the server computer and further operable to transmit one of the one or more files to a peer computer in response to a request from that peer computer; store file requests that identify a requested file and that are transmitted across the network by a requesting peer computer of the plurality of peer computers, the requested file being referenced in the index of information; and direct a peer-to-peer computer connection for the purpose of exchanging files between the requesting peer computer and a storing peer computer storing the requested file, wherein said peer-to-peer computer connection is directed upon detection of connection of the storing peer computer to the server.
 10. A system,comprising: a server system storing a central index of information forfiles that reside on a plurality of peer computers and that areavailable for peer-to-peer file sharing, wherein the server system isselectively coupled to the plurality of peer computers by one or morenetworks, wherein at least two of the plurality of peer computers havean agent program running on them that is operable to: transmit indexinformation to the server system to update the central index; andtransmit a requested file to a peer computer system to facilitatepeer-to-peer file sharing; wherein the server system is configured toexecute a file sharing program that is operable to: in response toqueries, perform searches for files based on the central index; storefile requests from requesting peer computers of the plurality of peercomputers, wherein the requested files are referenced in the centralindex, and wherein the stored file requests include a first request froma first of the plurality of peer computers for a first file stored by asecond of the plurality of peer computers; and upon detecting that thesecond peer computer is connected to the server computer, direct apeer-to-peer connection between the first and second peer computers toexchange the first file.
11. The system of claim 10, further comprising the plurality of peer computers.
12. The system of claim 10, wherein the server system is one of the plurality of peer computers.
13. The system of claim 10, wherein the second peer computer is located behind a firewall, and wherein the peer-to-peer connection includes the second peer computer sending a message to the first peer computer with the requested first file as an attachment to the message.
14. The system of claim 10, wherein the first file is exchanged anonymously between the first and second peer computers.
15. The system of claim 10, wherein the one or more networks include the Internet.
16. The system of claim 10, wherein the one or more networks include an intranet.
17. The system of claim 10, wherein the server system includes one or more computer systems.
18. The system of claim 10, including a secondary DNS system configured to map names of the plurality of peer computers to IP addresses of the plurality of peer computers.
19. The system of claim 10, wherein the plurality of peer computers includes mobile devices.
20. The system of claim 10, wherein the file sharing program is operable to: receive the first file at the server system from the second peer computer when, upon the second peer computer being detected as connected to the server system, the first peer computer is detected as not being connected to the server system; and upon the first peer computer subsequently re-connecting to the server system, transmit the first file from the server system to the first peer computer.
21. A server system selectively coupled to a plurality of peer computers via one or more networks, wherein the server system stores program instructions executable to: maintain a central index of files stored on various ones of the plurality of peer computers and that are available for peer-to-peer file sharing; receive, from peer computers within the plurality of peer computers, requests for files referenced in the central index and stored on various ones of the plurality of peer computers; store one or more of the received requests that are for files stored on peer computers that are not currently connected to the server system; detect that a first peer computer has connected to the server system, the first peer computer storing a first file that is the subject of a stored file request from a second peer computer; in response to said detecting, direct an anonymous peer-to-peer connection between the first and second peer computers to share the first file.
22. The server system of claim 21, wherein the server system stores program instructions executable to: after detecting that the first peer computer has connected to the server system, determine that the second peer computer is not currently connected to the server system; in response to determining that the second peer computer is not currently connected to the server system, store the first file at the server system until detecting that the second peer computer has re-connected to the server system; and transfer the first file to the second peer computer upon said detecting that the second peer computer has re-connected to the server system.
23. The server system of claim 21, wherein the first peer computer is located behind a firewall, and wherein the peer-to-peer connection includes the first peer computer sending an email message to the second peer computer, wherein the email message includes the first file as an attachment.
24. The server system of claim 21, wherein the server system is coupled to a system of computers configured to map names of the plurality of peer computers to addresses of the plurality of peer computers.
25. The server system of claim 21, wherein the server system is a distributed system.
26. The server system of claim 21, wherein the server system stores a plurality of files available for peer-to-peer file sharing with the plurality of peer computers.
27. A method, comprising: maintaining, at a server system, a central index of files stored on a plurality of peer computers coupled to the server system via one or more networks; the server system receiving, from a first of the plurality of peer computers, a request for a first file referenced by the central index, wherein the first file is stored by a second of the plurality of peer computers that is currently not connected to the server system; upon detecting that the second peer computer has connected to the server system, the server system directing a peer-to-peer connection between the first and second peer computers to share the first file.
28. The method of claim 27, wherein the peer-to-peer connection is anonymous and made via the Internet.
29. The method of claim 27, further comprising accessing a secondary domain system configured to map names of the plurality of peer computers to their corresponding addresses.
30. The method of claim 27, further comprising: upon determining that the first peer computer is not currently connected to the server system: directing the second peer computer to transmit the first file to the server system; and transmitting the first file from the server system to the first peer computer when the first peer computer re-connects to the server system.
31. A computer-readable memory medium storing program instructions executable by a server system to: maintain a central index of files stored on a plurality of peer computers; receive, from a first of the plurality of peer computers, a request for a first file referenced by the central index, wherein the first file is stored by a second of the plurality of peer computers that is currently not connected to the server system; upon detecting that the second peer computer has connected to the server system, direct a peer-to-peer connection between the first and second peer computers to share the first file.
32. The computer-readable memory medium of claim 31, further storing program instructions executable by the computer system to: upon detecting that the second peer computer has connected to the server system and detecting that the first peer computer is not connected to the server system: direct the second peer computer to transmit the first file to the server system; when the first peer computer re-connects to the server system, transmit the first file from the server system to the first peer computer.
33. The computer-readable memory medium of claim 32, wherein the peer-to-peer connection is anonymous and made via a public network.
34. A method, comprising: a first of a plurality of peer computers connecting to a server system, the first peer computer storing a first file that has been requested by a second of the plurality of peer computers while the first peer computer was not connected to the server system, wherein the first file is referenced by a central index stored by the server system; upon the first peer computer connecting to the server system, the first peer computer receiving a request for the first file from the server system; responsive to the request from the server system, the first peer computer sharing the requested first file with the second peer computer over a public network.
35. The method of claim 34, wherein said sharing includes the server system transmitting the requested first file to the second peer computer if the second peer computer is not connected to the server system when the first peer computer connects to the server system.
36. The method of claim 34, wherein said sharing is anonymous.
37. The method of claim 34, wherein the first and/or second peer computers are mobile devices.
38. A method, comprising: a first of a plurality of peer computer systems requesting a first file from a server system, wherein the server system stores a central index of files residing on various ones of the plurality of peer computer systems, wherein the first file resides on a second of the plurality of peer computer systems that is not currently connected to the server system, and wherein the request for the first file is stored by the server system in response to the second peer computer system not currently being connected; upon the second peer computer system connecting to the server system and receiving direction from the server system to share the first file, the first peer computer system receiving the first file.
39. The method of claim 38, wherein when the first peer computer system is not connected to the server system when the second peer computer system connects, said receiving occurs upon the first peer computer system re-connecting.
40. The method of claim 38, wherein the first and second peer computers are each connected to the server system via a respective communication path that includes a public network.
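
By way of illustration only, the server-side behavior recited in the claims above (a central index, stored file requests, a peer-to-peer connection directed when the storing peer connects, and server-held delivery when the requesting peer is offline) might be modeled as in the following minimal sketch. It is not the patented implementation. The class name SharingServer, its methods, and the tuple-based log are hypothetical and do not appear in the specification, and no real network transfer is performed; each directed transfer is only recorded.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SharingServer:
    """Hypothetical in-memory model of the claimed server system."""
    index: dict = field(default_factory=dict)    # file name -> set of peers storing it
    online: set = field(default_factory=set)     # peers currently connected
    pending: dict = field(default_factory=lambda: defaultdict(list))  # file -> waiting requesters
    held: dict = field(default_factory=lambda: defaultdict(list))     # requester -> files held at server
    log: list = field(default_factory=list)      # record of directed transfers

    def publish(self, peer, files):
        """Agent program transmits index information for its files."""
        for name in files:
            self.index.setdefault(name, set()).add(peer)

    def request(self, requester, name):
        """Direct a transfer now if a storing peer is online, else store the request."""
        if name not in self.index:
            raise KeyError(name)
        sources = self.index[name] & self.online
        if sources:
            self.log.append(("p2p", min(sources), requester, name))
        else:
            self.pending[name].append(requester)

    def connect(self, peer):
        """Handle a peer connecting: deliver held files, then serve stored requests."""
        self.online.add(peer)
        for name in self.held.pop(peer, []):
            self.log.append(("server_to_peer", peer, name))
        for name, holders in self.index.items():
            if peer not in holders:
                continue
            for requester in self.pending.pop(name, []):
                if requester in self.online:
                    # Both peers connected: direct a peer-to-peer exchange.
                    self.log.append(("p2p", peer, requester, name))
                else:
                    # Requester offline: storing peer sends the file to the server.
                    self.log.append(("peer_to_server", peer, name))
                    self.held[requester].append(name)

    def disconnect(self, peer):
        self.online.discard(peer)


# Usage: peer A requests a file while storing peer B is offline.
server = SharingServer()
server.publish("B", ["report.doc"])
server.connect("A")
server.request("A", "report.doc")   # stored, since B is not connected
server.connect("B")                 # stored request is now served
print(server.log)
```

In this sketch the stored request is removed from the queue as soon as a storing peer connects, and a file held at the server is released on the requester's next connection, which mirrors the two delivery paths in the claims; a real service would also need authentication, anonymization, and actual file transport, none of which is modeled here.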